Capturing Global Informativeness in Open Domain Keyphrase Extraction

Abstract

Open-domain KeyPhrase Extraction (KPE) aims to extract keyphrases from documents without domain or quality restrictions, e.g., web pages with variant domains and qualities. Recently, neural methods have shown promising results in many KPE tasks due to their powerful capacity for modeling contextual semantics of the given documents. However, we empirically show that most neural KPE methods prefer to extract keyphrases with good phraseness, such as short and entity-style n-grams, instead of globally informative keyphrases from open-domain documents. This paper presents JointKPE, an open-domain KPE architecture built on pre-trained language models, which can capture both local phraseness and global informativeness when extracting keyphrases. JointKPE learns to rank keyphrases by estimating their informativeness in the entire document and is jointly trained on the keyphrase chunking task to guarantee the phraseness of keyphrase candidates. Experiments on two large KPE datasets with diverse domains, OpenKP and KP20k, demonstrate the effectiveness of JointKPE with different pre-trained variants in open-domain scenarios. Further analyses reveal the significant advantages of JointKPE in predicting long and non-entity keyphrases, which are challenging for previous neural KPE methods. Our code is publicly available at https://github.com/thunlp/BERT-KPE.
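
The abstract describes training a ranker on document-level informativeness jointly with a chunking task. The following is a minimal sketch of what such a joint objective could look like, not the authors' implementation. It assumes that a phrase's global score is the maximum over the scores of its occurrences, that ranking uses a pairwise hinge loss, and that chunking uses binary cross-entropy. All function names, the margin value, and the equal weighting of the two losses are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def global_scores(occurrences):
    """Pool per-occurrence scores into one score per distinct phrase.

    Assumption: max-pooling over occurrences stands in for the
    document-level ("global") informativeness of a phrase.
    """
    best = defaultdict(lambda: float("-inf"))
    for phrase, score in occurrences:
        best[phrase] = max(best[phrase], score)
    return dict(best)


def ranking_loss(scores, keyphrases, margin=1.0):
    """Pairwise hinge loss: every keyphrase should outscore every
    non-keyphrase by at least `margin` (hypothetical value)."""
    positives = [s for p, s in scores.items() if p in keyphrases]
    negatives = [s for p, s in scores.items() if p not in keyphrases]
    pairs = [(pos, neg) for pos in positives for neg in negatives]
    if not pairs:
        return 0.0
    return sum(max(0.0, margin - pos + neg) for pos, neg in pairs) / len(pairs)


def chunking_loss(chunk_probs, keyphrases):
    """Binary cross-entropy over candidate n-grams: is this span a keyphrase chunk?"""
    if not chunk_probs:
        return 0.0
    total = 0.0
    for phrase, prob in chunk_probs:
        target = 1.0 if phrase in keyphrases else 0.0
        prob = min(max(prob, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
        total -= target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob)
    return total / len(chunk_probs)


def joint_loss(occurrences, chunk_probs, keyphrases):
    """Sum of the two losses; equal weighting is an assumption of this sketch."""
    ranking = ranking_loss(global_scores(occurrences), keyphrases)
    return ranking + chunking_loss(chunk_probs, keyphrases)


# Toy inputs (made up): (phrase, score) per occurrence and (phrase, probability) per candidate.
occurrences = [
    ("keyphrase extraction", 0.2),
    ("keyphrase extraction", 1.5),
    ("web pages", 0.9),
    ("the given", -0.4),
]
chunk_probs = [("keyphrase extraction", 0.8), ("web pages", 0.6), ("the given", 0.1)]
keyphrases = {"keyphrase extraction"}

print(joint_loss(occurrences, chunk_probs, keyphrases))
```

In this toy input the phrase "keyphrase extraction" occurs twice, and only its higher-scoring occurrence contributes to the ranking term. This is one simple way to let a single strong context determine a phrase's document-level score.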

Related resources

Domain-Specific Keyphrase Extraction

Keyphrases are an important means of document summarization, clustering, and topic search. Only a small minority of documents have author-assigned keyphrases, and manually assigning keyphrases to existing documents is very laborious. Therefore it is highly desirable to automate the keyphrase extraction process. This paper shows that a simple procedure for keyphrase extraction based on the naive...

A New Domain Independent Keyphrase Extraction System

In this paper we present a keyphrase extraction system that can extract potential phrases from a single document in an unsupervised, domain-independent way. We extract word n-grams from the input document. We incorporate linguistic knowledge (i.e., part-of-speech tags) and statistical information (i.e., frequency, position, lifespan) of each n-gram in defining candidate phrases and their respectiv...

Pke: an Open Source Python-based Keyphrase Extraction Toolkit

We describe pke, an open source python-based keyphrase extraction toolkit. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new approaches. pke also allows for easy benchmarking of state-of-the-art keyphrase extraction approaches, and ships with supervised models trained on the SemEval-2010 dataset (Kim et al., 2010).

Journal

Journal title: Lecture Notes in Computer Science

Year: 2021

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-88483-3_21